ABSTRACT

Post-translational modifications (PTMs) are involved 
in the regulation and structural stabilization of 
eukaryotic proteins. The combination of individual 
PTM states is key to modulating cellular functions,
as has become evident in a few well-studied proteins.
This combinatorial setting, dubbed the PTM code, 
has been proposed to be extended to whole prote- 
omes in eukaryotes. Although we are still far from 
deciphering such a complex language, thousands of 
protein PTM sites are being mapped by high- 
throughput technologies, thus providing sufficient 
data for comparative analysis. PTMcode
(http://ptmcode.embl.de) aims to compile known and
predicted PTM associations to provide a framework
that would enable hypothesis-driven experimental 
or computational analysis at various scales. In its
first release, PTMcode provides PTM functional as- 
sociations of 13 different PTM types within proteins 
in 8 eukaryotes. They are based on five evidence 
channels: a literature survey, residue co-evolution, 
structural proximity, PTMs at the same residue and 
location within protein regions highly enriched in PTMs
(hotspots). PTMcode is presented as a protein-based,
searchable database with an interactive
web interface providing the context of the 
co-regulation of nearly 75000 residues in >10000 
proteins. 

INTRODUCTION 

Most eukaryotic proteins are targeted by a multitude of 
post-translational modifications (PTMs) that fine-tune
their function as a rapid response to stimuli without
involvement of genomic, transcriptomic or translational 
regulation. These PTMs are present in various types and 
combinations, and their on-off status can vary during the 
lifetime of a protein, thereby fine-tuning its function,
localization and interaction with other molecules.
Specific mechanisms have been associated with particular
pairs of PTM types, like the competition for serine and
threonine residues by phosphorylation and O-linked 
glycosylation (1,2) or the promotion of ubiquitination by 
phosphorylation that leads to protein degradation (3), and 
there are extensive studies describing the regulation by 
PTM interplay of individual proteins such as the tumor 
suppressor p53 (4) or the well-known compilation of
molecular switches that occur within histone tails (5),
suggesting the existence of a 'PTM code' (6-9) based on the
presence and association of several PTMs that together
perform a particular function (10). Most of the studies
trying to decipher this molecular barcode are based on 
single proteins and few PTM types (4,11); however, 
thanks to the recent technological advances in the mass 
spectrometry-based detection methods (12), an increasing 
amount of data about protein modifications is becoming 
available, increasing the diversity and abundance of 
reported PTMs, although it is far from being complete. 
Yet, deciphering a potential PTM code remains a difficult 
challenge, as the collection of the parts list, the
PTM repertoire, is only a first step, and individual
PTMs have to be functionally associated, somewhat
analogous to the delineation of proteomes whereby
individual proteins are involved in complex protein-protein
interactions. 

Only recently, several studies have explored the associ- 
ation of several PTM types in whole proteomes based on, 
for instance, the study of acetylation status by the system- 
atic perturbation of kinases (13), structural changes in 
modified residues in response to in-silico perturbations of
other modification sites (14), the competition of several 
PTMs for a residue (15), the presence of clusters of 
PTMs within the protein sequence (16) or the co-evolution 
of the modified residues, where pairs of particular PTM
types were associated with specific protein localizations,
functions and protein functional units such as short
linear motifs and globular domains (17). The latter was
the first attempt to characterize the interplay between a
large number of PTM types in several eukaryotes. Based
on these recent independent developments, one should be 
able to derive coherent functional associations between 
modified residues, thus adding value to information 
about individual modifications, stored in classical 
protein resources (18) or PTM specific databases 
(19-22), again analogous to protein sequence and 
protein-interaction databases. A functional association 
between two PTMs should be seen here as a broad 
concept that stands not only for a physical interaction
or a competition (PTM crosstalk) but also for
broader associations, i.e. PTMs that are not present
in the protein at the same time but are involved in the same
protein function.

The PTMcode database combines results from several 
large-scale analyses that identify known and predicted 
functionally associated PTMs of 13 different types from 
8 eukaryotes. As PTMcode is the first large-scale public database
providing this kind of information, we believe that
it will enable both computational and molecular
biology laboratories to further PTM research in many 
ways and at various scales, ranging from individual mech- 
anistic studies to global network analyses towards a global 
PTM code. 



RESULTS 

Available PTMs and their functional associations 

We extracted all experimentally validated PTMs available at
UniProt (18), PHOSIDA (19), PhosphoSite (20), 
PhosphoELM (21), O-GlycBase (22), dbPTM (23) and 
HPRD (24) and performed a pre-processing task to 
avoid redundancy and non-matching modifications. 
First, protein IDs from the sources were converted to a
reference ID, taken from the STRING database (25),
and second, the modified residues were required to
match the amino acids in a reference sequence, taken
from the eggNOG database (26), which in turn fetches
the longest protein isoform from the Ensembl database 
(27). In total, we integrated 136 258 experimentally
determined, non-redundant PTMs of 13 different types
in 25 765 proteins of 8 different eukaryotes (Homo
sapiens, Mus musculus, Rattus norvegicus, Bos taurus,
Gallus gallus, Drosophila melanogaster, Caenorhabditis 
elegans and Saccharomyces cerevisiae). 
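
The residue-matching and redundancy-removal steps of this pre-processing can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline used to build PTMcode: the function names, the tuple layout and the toy sequence are hypothetical, and the conversion of protein IDs to STRING identifiers is left out.

```python
# Hypothetical sketch: keep a PTM only if the reported amino acid is found
# at the reported (1-based) position of the reference sequence, and keep
# each (position, PTM type) combination only once.


def ptm_matches_reference(reference_seq, position, residue):
    """Return True if `residue` occurs at 1-based `position` in `reference_seq`."""
    if position < 1 or position > len(reference_seq):
        return False
    return reference_seq[position - 1] == residue


def filter_ptms(reference_seq, ptms):
    """Drop non-matching and redundant PTMs.

    `ptms` is an iterable of (position, residue, ptm_type) tuples
    (an assumed layout, chosen for this example).
    """
    seen = set()
    kept = []
    for position, residue, ptm_type in ptms:
        key = (position, ptm_type)
        if key in seen:
            continue  # same site and type already reported by another source
        if ptm_matches_reference(reference_seq, position, residue):
            seen.add(key)
            kept.append((position, residue, ptm_type))
    return kept


toy_sequence = "MKSTAY"  # invented sequence, for illustration only
toy_ptms = [
    (3, "S", "phosphorylation"),
    (3, "S", "phosphorylation"),  # redundant entry from a second source
    (2, "K", "acetylation"),
    (5, "S", "phosphorylation"),  # position 5 is A, so this one is rejected
]
print(filter_ptms(toy_sequence, toy_ptms))
```

In this toy case the duplicated phosphoserine is kept once and the entry whose residue disagrees with the reference sequence is discarded.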

Figure 1. PTMcode integrates five types of evidence channels to collect
known and predicted functionally associated PTMs within the same
protein, named here as: (A) 'Co-evolution', where two modified
residues are found to be significantly co-evolving across eukaryotes.
The multiple sequence alignment on the right corresponds to the
orthologous group of the protein modified by the two acetylated
lysines shown. (B) 'Structural distance', where two modified residues
are found closer than a threshold (based on the distance of known
PTM interactions). (C) 'Same residue', where the two modifications
target the exact same residue; for instance, serine 2124 of the mouse
protein Bsn can be modified either by phosphorylation or O-linked
glycosylation. (D) 'Manual annotation', derived from a literature
survey in order to cover known associated PTMs. (E) 'PTM
hotspots', which represent protein regions enriched in modifications; the
figure shows a region of 60 amino acids where 18 of them can be
modified.

We combine five different channels for the extraction of
known or predicted functional associations between PTMs
(Figure 1): (i) sites that co-evolve across many eukaryotes
as described in (17); (ii) PTMs associated based on their
proximity in the protein structure, as modified sites that
tightly co-operate seem to be clustered (4,28); (iii) PTMs
known to modify the same residue in the protein sequence;
(iv) a manual annotation survey to identify known PTM
crosstalks in the literature; and (v) PTMs that are located
within PTM 'hotspots', regions of significantly high modification
density within the protein sequence (16). From the four
channels that assign pairwise associations (co-evolution,
structural distance, PTMs modifying the same residue
and manual annotation), PTMcode holds 401 690
distinct functional associations describing the co-regulation
of 74 839 residues in 10 410 proteins. In addition,
7400 residues have been extracted from 1635 computed
regions with high PTM density. More details on the
content of the PTM associations are given in Table 1.

PTMs in the context of protein domains and structures 

Table 1. Number of functional associations that are predicted by
each of the evidence channels and those that have at least two different
types of evidence, considering only the four types of evidence that assign
concrete pairwise associations (co-evolution, structural distance, same
residue and manual annotation)

PTMcode is not a resource aiming at PTM collections, but
instead at the exploration, retrieval and analysis of predicted
and known functional associations between PTMs within
proteins. Yet, we map PTMs onto curated protein 
domains and unstructured regions as defined by the 
database SMART (29). PTMs then become part of the
functional annotation, and their associations can be seen 
as part of the whole regulation of protein domains. We 
classify PTMs into three categories: 'regulatory' 
(those involved in regulation of protein function), 
'stabilizing' (those that are not involved in regulation of 
function but required for conformational purposes) and 
'uncharacterized' (those with unknown or unclear 
function, as in the case of C-linked glycosylations). In 
addition, we added, when applicable, a view of the three-dimensional
structure of the domain or the entire protein,
highlighting any two modified residues that are predicted
to be functionally associated by any of our five source
channels. The structure is displayed interactively using the popular
Java viewer Jmol (http://www.jmol.org)
and can be explored using all Jmol features.
Information about modifying enzymes (such as protein 
kinases) is also provided by extracting it from several 
sources (19-21,24). 

PTMcode also supplies a score for each PTM, the 
relative Residue Conservation Score (rRCS), that 
measures the conservation of the modified residue over 
the oldest eukaryotic orthologous group where the 
protein is present and takes into account both the conservation
of the residue within the orthologous proteins and
the evolutionary distance between the species with the
conserved residue; for full details on the rRCS algorithm
and performance see (17). An rRCS >95 means that the
modified residue is more conserved than 95% of the
same amino acids within the same type of protein region.
The rRCS can be used as a proxy to hint at PTM func- 
tionality in the absence of other data. Conservation has 
been used before for the same purpose (30-32), although 
caution is required in its interpretation (33). Yet, several 
other PTM databases provide simple information about 
protein and residue conservation to be used as a filter for 
functional sites (19,20). 
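
The percentile reading of the score (an rRCS >95 meaning 'more conserved than 95% of comparable residues') can be illustrated with a short Python sketch. It shows only how a raw conservation value could be ranked against a background of the same amino acid in the same type of protein region; it is not the rRCS algorithm itself, which is described in (17). The function name and the background values are assumptions made for the example.

```python
from bisect import bisect_left


def percentile_rank(score, background_scores):
    """Percentage of background scores that are strictly below `score`."""
    ranked = sorted(background_scores)
    if not ranked:
        raise ValueError("background_scores must not be empty")
    return 100.0 * bisect_left(ranked, score) / len(ranked)


# Invented background: raw conservation values of 100 comparable residues.
toy_background = list(range(100))
toy_rank = percentile_rank(97, toy_background)
print(toy_rank)
```

A residue ranked this way would pass a filter such as `toy_rank > 95`, which is how a conservation-based cut-off might be applied in practice.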

Co-evolution 

We used the co-evolution of two modified residues to 
predict their functional association as described in (17). 
This strategy has already shown that co-evolving pairs of
different PTM types can be specifically linked in proteins
with certain functionalities and localizations and can even
potentially co-regulate protein interactions through their
association with particular protein domains and short linear
motifs (17). The functional associations provided by this 
prediction channel should be considered from a broad 
perspective, as they can range from physical interactions 
(as seen by the overlap with pairs of PTMs found close in 
the protein structure) to their participation in the same 
protein functionality although not necessarily at the 
same stage. The species where both amino acids are 
conserved over the protein orthologous groups are 
shown in the co-evolution pop-up window in which the 
protein alignment with the respective columns highlighted 
can be visualized using Jalview (34). 

Structural distance 

A straightforward mechanism by which two PTMs can be
associated is based on their proximity (4,28), measured
here using the 3D structure of the protein. If they are
close enough, they could be either competing for the
same space, e.g. methylation inhibiting the phosphorylation
of adjacent serines (35), or co-operating in the regulation
of the same protein region [e.g. the highly modified
cassette of amino acids in p53 (4)]. We mapped PTM 
residues to three-dimensional structures of proteins from 
the Protein Data Bank (36) and calculated the spatial 
distance between pairs of modified residues. To obtain
a first estimate of an appropriate distance threshold for inferring
physical interaction, we measured the average distance for
12 pairs of associated modifications reported in the
literature to physically interact. Thus, modified residues
closer than 4.69 Å are predicted either to be physically in
contact or to be mutually exclusive, competing for the
same protein niche; their conformation can be visualized
using the Jmol plugin.
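
The distance criterion can be sketched in a few lines of Python. The text does not state which atoms define the distance between two residues, so the sketch assumes the minimum distance over all atom pairs; the function names and the coordinates are invented for illustration, and parsing of Protein Data Bank files is omitted.

```python
import math

DISTANCE_THRESHOLD = 4.69  # Angstrom, the cut-off quoted in the text


def min_atom_distance(residue_a, residue_b):
    """Smallest Euclidean distance between any atom of A and any atom of B.

    Each residue is a list of (x, y, z) atom coordinates in Angstrom.
    Using the minimum over atom pairs is an assumption of this sketch.
    """
    return min(math.dist(a, b) for a in residue_a for b in residue_b)


def structurally_associated(residue_a, residue_b, threshold=DISTANCE_THRESHOLD):
    """True if the two residues are closer than the threshold."""
    return min_atom_distance(residue_a, residue_b) < threshold


# Invented coordinates for three modified residues.
toy_serine = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
toy_lysine = [(4.5, 0.0, 0.0), (6.0, 0.0, 0.0)]
toy_distant = [(20.0, 0.0, 0.0)]

print(structurally_associated(toy_serine, toy_lysine))
print(structurally_associated(toy_serine, toy_distant))
```

With these toy coordinates the first pair lies 3.0 Å apart and falls under the threshold, whereas the second pair does not.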

PTMs modifying the same residue 

The simplest evidence for a direct crosstalk of two PTMs 
is a modification of the same residue in the protein 
sequence, which would reflect either that they compete 
for the same amino acid (mutually exclusive PTMs) or 
that they co-operate for the same function if the modifi- 
cations happen sequentially in time. Two well-known as- 
sociations between PTM types are described to follow a 
competition strategy. Phosphorylation and O-linked 
glycosylation modify serine and threonine amino acids 
and constitute molecular switches that co-regulate 
protein function and localization within the so-called 
yin-yang sites (1,2). The promiscuous amino acid lysine 
can be acetylated, SUMOylated, ubiquitinated and 
methylated, and it has been described to be co-regulated 
by several PTM types at the same position, for example, 
during the regulation of histone tails (5). 

We identified 576 residues through this channel,
mostly involving the cases reported above but also
other pairs of PTM types [e.g. 10 instances of
hydroxylated and O-linked glycosylated lysines that
happen sequentially in time in collagen proteins (37) or
7 instances between phosphorylation and sulfation].
PTMcode provides a tentative annotation for the pairs of
associated PTMs predicted by this channel, classifying 
them as 'competing', 'co-operating' or 'uncharacterized' 
associations. 
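
Detecting residues targeted by more than one PTM type amounts to grouping modifications by protein and position, as in the following Python sketch. The function name and the tuple layout are assumptions; the Bsn entry follows the example given in Figure 1, while the second protein is invented. The classification into 'competing', 'co-operating' or 'uncharacterized' is not attempted here.

```python
from collections import defaultdict
from itertools import combinations


def same_residue_pairs(ptms):
    """Return (protein, position, type_a, type_b) for PTM types sharing a residue.

    `ptms` is an iterable of (protein_id, position, ptm_type) tuples
    (an assumed layout, chosen for this example).
    """
    types_at_site = defaultdict(set)
    for protein_id, position, ptm_type in ptms:
        types_at_site[(protein_id, position)].add(ptm_type)

    pairs = []
    for (protein_id, position), types in sorted(types_at_site.items()):
        # Every unordered pair of distinct PTM types at one residue is reported.
        for type_a, type_b in combinations(sorted(types), 2):
            pairs.append((protein_id, position, type_a, type_b))
    return pairs


toy_sites = [
    ("Bsn", 2124, "phosphorylation"),
    ("Bsn", 2124, "O-linked glycosylation"),
    ("toy_protein", 10, "acetylation"),  # invented, single PTM type only
]
print(same_residue_pairs(toy_sites))
```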



Manual annotation 

PTMcode not only holds predicted associations but also 
PTM sites that are reported in the literature to crosstalk. 
They were extracted from the literature review of
PTM type interplay in Minguez et al. (17) and are now
included in the database after mapping them onto the
correct protein sequences. Links to the scientific articles
through PubMed and a short description on the mechan- 
ism of action are provided for the 57 associations found 
using this channel. 

Hotspots 

PTMs can also be part of regulatory hotspots, small 
regions in the protein sequence that are enriched in modi- 
fications (4). Such regions have been recently defined (16), 
and rules have been established and benchmarked. For
example, modified lysines are more likely to be located
within a distance of 15 amino acids of a phosphorylated
residue, therefore forming hotspot regions where PTMs
tend to cluster. According to the established rules, for
each of the modified residues in a protein, we define a
window of 31 amino acids (15 downstream and 15
upstream), count the number of modifications there and
compare it using a Fisher exact test to the number of
modifications in the whole protein. All resultant P-values
were adjusted by False Discovery Rate, and overlapping 
regions were collapsed to give a total of 1635 hotspots that 
are visible and explorable within the PTMcode web 
interface. 
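
The window-based procedure can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements from the text: the Fisher exact test is taken as one-sided (enrichment only) on a 2x2 table of modified versus unmodified residues inside versus outside the window, the False Discovery Rate adjustment is the Benjamini-Hochberg procedure, and the significance cut-off `alpha` of 0.05 is chosen for illustration. All function names and the toy protein are hypothetical.

```python
import math


def fisher_enrichment_p(hits, window_size, total_modified, seq_length):
    """One-sided Fisher exact P-value for at least `hits` modified residues
    in a window, computed from the hypergeometric distribution."""
    denominator = math.comb(seq_length, window_size)
    upper = min(window_size, total_modified)
    numerator = sum(
        math.comb(total_modified, k)
        * math.comb(seq_length - total_modified, window_size - k)
        for k in range(hits, upper + 1)
    )
    return numerator / denominator


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def find_hotspots(seq_length, modified_positions, flank=15, alpha=0.05):
    """Return merged (start, end) regions significantly enriched in PTMs."""
    modified = sorted(set(modified_positions))
    windows, p_values = [], []
    for pos in modified:
        start = max(1, pos - flank)          # window is clipped at the termini
        end = min(seq_length, pos + flank)
        hits = sum(1 for m in modified if start <= m <= end)
        windows.append((start, end))
        p_values.append(
            fisher_enrichment_p(hits, end - start + 1, len(modified), seq_length)
        )
    adjusted = benjamini_hochberg(p_values)
    hotspots = []
    for (start, end), q in zip(windows, adjusted):
        if q > alpha:
            continue
        if hotspots and start <= hotspots[-1][1]:
            # Collapse a window that overlaps the previous significant one.
            hotspots[-1] = (hotspots[-1][0], max(hotspots[-1][1], end))
        else:
            hotspots.append((start, end))
    return hotspots


# Invented protein of 300 residues: eight clustered sites and one isolated site.
toy_sites = [100, 102, 104, 106, 108, 110, 112, 114, 250]
toy_hotspots = find_hotspots(300, toy_sites)
print(toy_hotspots)
```

In this toy protein the eight clustered sites yield overlapping significant windows that collapse into a single region, while the isolated site at position 250 does not form a hotspot.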



PTMcode web interface. Query, results and availability 

PTMcode is accessible through the URL http://ptmcode.embl.de.
The web interface provides a browser to access
all proteins that have some known or predicted functional 
associations between PTMs (Figure 2) and a search engine 
where the user can enter a protein sequence or any
protein ID. In addition, the user can restrict the search to a
particular residue or protein region. PTMcode is a
protein-oriented database, as one of the major motivations
for the resource is to help experts on particular proteins or
research fields to explore functional hypotheses.
A Flash-based graphic interface enables intuitive
interactivity. A single or many functional associations
can be explored, and supporting information (alignment,
structure, etc.) for each of the five channels can be easily
called, all within the context of the protein domain 
architecture with links to the respective SMART entries. 
Figure 3 shows an overview of the predicted 
functional co-regulation by several post-translational 
modifications of a protein within the PTMcode 
environment. 
Figure 2. The PTMcode database uses in its first release 13 different
types of PTMs that are abbreviated with a two-letter code as: Ph
(phosphorylation), NG (N-linked glycosylation), Ac (acetylation), OG
(O-linked glycosylation), Ub (ubiquitination), Me (methylation), SM
(SUMOylation), Hy (hydroxylation), Ca (carboxylation), Pa
(palmitoylation), Su (sulfation), Ni (nitrosylation) and CG (C-linked
glycosylation). The known and predicted functional associations 
between two of these types of modifications are accessible by selecting 
any two PTM types in an interactive network that offers all possible 
connections (A). These connections are constrained by the fact that
both modifications should at least occur in the same protein. The
PTM types are represented here by symbols, the size of which corres- 
ponds to their relative abundance in the database. The link widths 
represent the number of proteins modified by the two respective 
PTM types normalized by the total number of proteins harbouring 
the less abundant PTM type. The width thus indicates relative 
coverage of the particular pair-wise functional associations. For 
instance, based on the link widths, we see that a large proportion of 
proteins that are nitrosylated are also phosphorylated, while the pro- 
portion of proteins that are both hydroxylated and phosphorylated is 
smaller. Upon selection in the web interface, two types of modifications
are activated, and all the pairs of known and predicted functional
associations in all proteins are shown in a table (B) where each of the
entries can be further explored. These tables are available for download 
as text files. 
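
The link-width normalization described in the caption of Figure 2 can be written as a short Python sketch. It assumes that the 'less abundant' PTM type is the one found in fewer proteins; the function name and the protein identifiers are invented for the example.

```python
def link_width(proteins_a, proteins_b):
    """Proteins carrying both PTM types, divided by the number of proteins
    carrying the less abundant of the two types."""
    set_a, set_b = set(proteins_a), set(proteins_b)
    smaller = min(len(set_a), len(set_b))
    if smaller == 0:
        return 0.0
    return len(set_a & set_b) / smaller


# Invented protein sets for three PTM types.
toy_nitrosylated = {"p1", "p2"}
toy_phosphorylated = {"p1", "p2", "p3", "p4", "p5"}
toy_hydroxylated = {"p5", "p6", "p7", "p8"}

print(link_width(toy_nitrosylated, toy_phosphorylated))
print(link_width(toy_hydroxylated, toy_phosphorylated))
```

In this toy data every nitrosylated protein is also phosphorylated, giving the widest possible link, whereas only one of four hydroxylated proteins is phosphorylated, giving a narrower one.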

Conclusion and future plans 

PTMcode is a unique database in that it goes beyond mere 
compilation of PTMs, which is already covered by useful 
resources either dedicated to a single PTM type (21) or
covering various PTM types (19,20,23). The aim is to put these
PTMs into context, and the focus of the first PTMcode release
is to capture known and predicted functional associations
between PTMs within proteins. Other databases have started to
implement the functional context of individual PTMs,
providing, for instance, conservation information or
mapping PTMs onto protein domains or 3D structures.
PTMfunc (16), for example, provides information about the
regulation of binding interfaces, protein domains and even
the presence within a hotspot, although it lacks a graphical
display of the protein. PTMcode goes a step further
and is PTM association-centric, aiming at a description of 
a co-regulation landscape of a protein by means of its 
modified sites. In the future, we plan to develop 
PTMcode further by introducing more PTMs and more
accurate prediction methods and by extending it to associations
between proteins in order to prepare the ground for
deciphering the global PTM code. 
Figure 3. PTMcode offers the exploration of post-translational regulation
within thousands of proteins. (A) Interactive graphical display of
functional associations between PTMs within the human EGF receptor 
(EGFR). The protein is represented by the grey line at the top with 
globular domains and unstructured regions taken from the database 
SMART. Below this, PTM hotspots are shown as red lines (in 
EGFR, three hotspots were identified). At the bottom, the different 
PTMs are mapped and, on clicking one, the functional associations
with any other PTM are shown by arches coloured according to the 
five evidence channels. All functional associations for a particular PTM 
can be further explored from a table (B) that is interactively displayed. 
The conservation score for the PTMs that are predicted to be 
associated is listed in the table together with all the evidences that 
support the prediction. Clicking on each of the evidences will show a 
pop-up with more detailed information, for example, alignments or 
mapping of PTM pairs onto three-dimensional structures.